ABSTRACT

Database design for data warehouses is based on the notion of the snowflake schema and its 
important special case, the star schema. The snowflake schema represents a dimensional model 
which is composed of a central fact table and a set of constituent dimension tables which can be 
further broken up into subdimension tables. We formalise the concept of a snowflake schema in 
terms of an acyclic database schema whose join tree satisfies certain structural properties. We then 
define a normal form for snowflake schemas which captures its intuitive meaning with respect to a 
set of functional and inclusion dependencies. We show that snowflake schemas in this normal form 
are independent as well as separable when the relation schemas are pairwise incomparable. This 
implies that relations in the data warehouse can be updated independently of each other as long as 
referential integrity is maintained. In addition, we show that a data warehouse in snowflake normal 
form can be queried by joining the relation over the fact table with the relations over its dimension 
and subdimension tables. We also examine an information-theoretic interpretation of the snowflake 
schema and show that the redundancy of the primary key of the fact table is zero. 
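
The querying claim above can be illustrated with a minimal sketch, assuming a hypothetical sales warehouse held in an in-memory SQLite database. All table and column names (`sales`, `product`, `category`, `store`) and the helper functions are assumptions made for illustration and do not come from the paper. The fact table `sales` references the dimension tables `product` and `store`, and `product` references the subdimension table `category`. Foreign keys stand in for the inclusion dependencies, and primary keys for the functional dependencies.

```python
import sqlite3


def build_snowflake(conn: sqlite3.Connection) -> None:
    """Create and populate a tiny, hypothetical snowflake schema."""
    # SQLite enforces foreign keys (referential integrity) only when asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE category (              -- subdimension table
            category_id   INTEGER PRIMARY KEY,
            category_name TEXT NOT NULL
        );
        CREATE TABLE product (               -- dimension table
            product_id   INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL,
            category_id  INTEGER NOT NULL REFERENCES category(category_id)
        );
        CREATE TABLE store (                 -- dimension table
            store_id INTEGER PRIMARY KEY,
            city     TEXT NOT NULL
        );
        CREATE TABLE sales (                 -- central fact table
            product_id INTEGER NOT NULL REFERENCES product(product_id),
            store_id   INTEGER NOT NULL REFERENCES store(store_id),
            amount     REAL NOT NULL,
            PRIMARY KEY (product_id, store_id)
        );
        """
    )
    conn.executemany("INSERT INTO category VALUES (?, ?)",
                     [(1, "Beverages"), (2, "Snacks")])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(10, "Tea", 1), (11, "Coffee", 1), (12, "Crisps", 2)])
    conn.executemany("INSERT INTO store VALUES (?, ?)",
                     [(100, "London"), (101, "Leeds")])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(10, 100, 5.0), (11, 100, 7.0),
                      (12, 101, 3.0), (10, 101, 2.0)])
    conn.commit()


def sales_by_category(conn: sqlite3.Connection) -> list:
    """Join the fact table with a dimension and its subdimension, then aggregate."""
    return conn.execute(
        """
        SELECT c.category_name, SUM(s.amount)
        FROM sales AS s
        JOIN product  AS p ON p.product_id  = s.product_id
        JOIN category AS c ON c.category_id = p.category_id
        GROUP BY c.category_name
        ORDER BY c.category_name
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_snowflake(connection)
    print(sales_by_category(connection))
```

In this sketch each table is loaded on its own, and the only cross-table check is the foreign-key constraint. That loosely mirrors the statement that relations can be updated independently as long as referential integrity is maintained. It is not a formal rendering of the snowflake normal form defined in the paper.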

Data warehouses are databases devoted to analytical processing. They are used to
support decision-making activities in most modern business settings, when complex data 
sets have to be studied and analyzed. The technology for analytical processing assumes 
that data are presented in the form of simple data marts, consisting of a well-identified 
collection of facts and data analysis dimensions (star schema). Despite the wide diffusion 
of data warehouse technology and concepts, we still miss me ... 

A Data Warehouse (DW) has been an approach adopted for giving support to the process
of taking decisions in an organization. This paper is concerned with the data warehouse
conceptual schema design starting from the conceptual operational schemas and user
requirements. We propose and illustrate an algorithm for automatic conceptual schema
development. Our algorithm uses an enterprise schema represented with UML as a
starting point for source driven data warehouse schema design and produces a set ... 

Abstract State Machines (ASMs) encourage high-level system specifications without
forcing the development into the "formal methods straight-jacket". This makes them an 
ideal formal method for applications in areas, where otherwise only semi-formal methods 
are used. One such area is the development of data warehouse and on-line analytical 
processing (OLAP) applications to which this article contributes. Based on an ASM ground 
model for data warehouses we show which problems have to be solved in t ... 

This paper challenges the currently popular "Data Warehouse is a Special Animal"
philosophy and advocates that practitioners adopt a more conservative "Data 
Warehouse=Database" philosophy. The primary focus is the relevancy of Multi- 
Dimensional logical schemas. After enumerating the advantages of such schemas, a 
number of caveats to the presumed advantages are identified. The paper concludes with 
guidelines and commentary on implications for data warehouse design methodologies. 

Data warehouses are dedicated to collecting heterogeneous and distributed data in order
to perform decision analysis. Based on multidimensional model, OLAP commercial 
environments such as they are currently designed in traditional applications are used to 
provide means for the analysis of facts that are depicted by numeric data (e.g., sales 
depicted by amount or quantity sold). However, in numerous fields, like in medical or 
bioinformatics, multimedia data are used as valuable information in the ... 

Data warehousing and on-line analytical processing (OLAP) are essential elements of
decision support, which has increasingly become a focus of the database industry. Many
commercial products and services are now available, and all of the principal database
management system vendors now have offerings in these areas. Decision support places 
some rather different requirements on database technology compared to traditional on- 
line transaction processing applications. This paper provides an overview ... 

Data Warehouses are data-intensive systems that are used for analytical tasks. As these
tasks do not depend on the latest updates by transactions, data warehouses can be set up 
in a way that input of data from operational databases and output to dialogue interfaces 
for on-line analytical processes (OLAP) can be separated. In the paper we describe how 
abstract state machines (ASMs) can be used to design distributed data warehouses. We 
formalise the ground idea of data warehouses by a ground model ... 

Modeling data warehouses is a complex task focusing, very often, into internal structures
and implementation issues. In this paper we argue that, in order to accurately reflect the 
users requirements into an error-free, understandable, and easily extendable data 
warehouse schema, special attention should be paid at the conceptual modeling phase. 
Based on a real mortgage business warehouse environment, we present a set of user 
modeling requirements and we discuss the involved concepts. Under ... 

In large data warehousing environments, it is often advantageous to provide fast,
approximate answers to complex decision support queries using precomputed summary 
statistics, such as samples. Decision support queries routinely segment the data into 
groups and then aggregate the information in each group (group-by queries). Depending 
on the data, there can be a wide disparity between the number of data items in each 
group. As a result, approximate answers based on uniform random sample ... 

14 Join synopses for approximate query answering
In large data warehousing environments, it is often advantageous to provide fast,
approximate answers to complex aggregate queries based on statistical summaries of the 
full data. In this paper, we demonstrate the difficulty of providing good approximate 
answers for join-queries using only statistics (in particular, samples) from the base 
relations. We propose join synopses as an effective solution for this problem and show 
how precomputing just one join synopsis 

model management
Model management is an approach to simplify the programming of metadata-intensive 
applications. It offers developers powerful operators, such as Compose, Diff, and Merge, 
that are applied to models, such as database schemas or interface specifications, and to 
mappings between models. Prior model management solutions focused on a simple class
of mappings that do not have executable semantics. Yet many metadata applications 
require that mappings be executable, expressed in SQL, XSLT, or other data ... 

Recent years have witnessed an increasing interest in designing algorithms for querying
and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only 
limited memory. Providing (perhaps approximate) answers to queries over such 
continuous data streams is a crucial requirement for many application environments; 
examples include large telecom and IP network installations where performance data from 
different parts of the network needs to be continuously collected and a ... 

Normalization as a way of producing good database designs is a well-understood topic.
However, the same problem of distinguishing well-designed databases from poorly
designed ones arises in other data models, in particular, XML. While in the relational world
the criteria for being well-designed are usually very intuitive and clear to state, they 
become more obscure when one moves to more complex data models. Our goal is to 
provide a set of tools for testing when a condition on a database design, ... 

information-theoretic study of 3NF
A recently introduced information-theoretic approach to analyzing redundancies in
database design was used to justify normal forms like BCNF that completely eliminate 
redundancies. The main notion is that of an information content of each datum in an 
instance (which is a number in [0,1]): the closer to 1, the less redundancy it carries. In 
practice, however, one usually settles for 3NF which, unlike BCNF, may not eliminate all 
redundancies but always guarantees dependency preservation. In this pa ... 

March 2005 Journal of the ACM (JACM), Volume 52 Issue 2
Publisher: ACM Press 

Normalization as a way of producing good relational database designs is a
well-understood topic. However, the same problem of distinguishing well-designed databases
from poorly designed ones arises in other data models, in particular, XML. While, in the
relational world, the criteria for being well designed are usually very intuitive and clear to 
state, they become more obscure when one moves to more complex data models. Our 
goal is to provide a set of tools for testing when a condition on a data ... 

Keywords: Information theory, XML, design, normal forms, normalization algorithms, 
relational databases 